A Connectionist Approach to Producing Rules
Describing  Monthly  UK  Divisia  Data 


Vincent  A.  Schmidt 
Air  Force  Research  Laboratory 
Dayton,  Ohio  USA 


Jane M. Binner
Aston  University 
Birmingham,  UK 


Abstract 

This  paper  demonstrates  a  mechanism  whereby 
rules  can  be  extracted  from  a  feedforward  neural 
network  trained  to  characterize  the  money- 
price  relationship,  defined  as  the  relationship 
between  the  rate  of  growth  of  the  money  supply 
and  inflation.  Monthly  Divisia  component 
data  is  encoded  and  used  to  train  a  group  of 
candidate  connectionist  architectures.  One 
candidate  is  selected  for  rule  extraction,  using 
a  custom  decompositional  extraction  algorithm 
that  generates  rules  in  human-readable  and 
machine-executable  form.  Rule  and  network 
accuracy  are  compared,  and  comments  are  made 
on the relationships expressed within the discovered
rules. The types of discovered relationships
could  be  used  to  guide  monetary  policy  decisions. 

Keywords: Divisia, Inflation, Neural Network,
Data Mining, Rule Generation

1  Introduction 

In  recent  years  the  relationship  between  “money” 
and  the  macroeconomy  has  assumed  prominence 
in  academic  literature  and  in  Central  Banks’ 
circles.  Although  some  Central  Bankers  have 
stated  they  have  formally  abandoned  the  notion 
of  using  monetary  aggregates  as  indicators  of  the 
impact  of  their  policies  on  the  economy,  research 
into  the  link  between  some  kind  of  monetary 
aggregate  and  the  price  level  is  still  prevalent. 
Attention  is  increasingly  turning  to  the  method 
of  aggregation  employed  in  the  construction  of 
monetary  indices.  The  most  sophisticated  index 
number  used  thus  far  relies  upon  the  formulation 
devised by Divisia [1], with roots firmly based in
microeconomic aggregation theory and statistical
index number theory.

Our hypothesis is that measures of money constructed using the
Divisia index number formulation are superior indicators of monetary
conditions when compared to their simple sum counterparts. Our
hypothesis is reinforced by a growing body of evidence from empirical
studies around the world which demonstrate that weighted index number
measures may be able to overcome the drawbacks of the simple sum,
provided the underlying economic weak separability and linear
homogeneity assumptions are satisfied. Ultimately, such evidence could
reinstate monetary targeting as an acceptable method of macroeconomic
control, including price regulation.

The theoretical case for weighted monetary aggregates never has been
challenged seriously. Their potential for use in practice, however,
has been questioned on three fronts. First, criticisms about the
choice of a benchmark rate of return and the treatment of risk when
measuring monetary user costs (both of which affect index weights)
suggest that such an index is subject to unknown, and presumably
large, measurement error. Second, if the money stock were measured as
the sum of its components, with each weighted by its share of total
expenditures on monetary services, it has been alleged (without
evidence) that central banks would be unable to influence the
behaviour of such an index in the pursuit of a monetary policy
objective. Most commonly, however, the case against the construction,
publication, and use of any superlative index of money has been
grounded in empirical evidence showing that an official simple sum
measure, in the context of a particular model, time period, or set of
tests, performs as well as or better than a weighted index of the same
asset collection. In sum, these perceived shortcomings have led most
monetary economists and policy-makers to conclude that the practical
difficulties associated with finding empirical proxies for a weighted
index's theoretical components and explaining the behaviour of such an
index to authorities who monitor central bank actions more than offset
the small marginal gains (if any) from use of the index itself.

This paper addresses the problem of how best to construct monetary
aggregates, given the extraordinary debate on this topic in the
macroeconomics literature. A useful summary for 11 countries is
provided in Belongia and Binner [2]. Even the superlative Divisia
monetary aggregates have been found to perform less than optimally in
the recent past using monthly US data over the period 1960-2004 (see
[3]); guidance on improved construction of the monetary aggregates is
therefore a vital area for further research. Our policy goal in this
paper is inflation, the current focus of monetary policy targets in
the UK and most major macroeconomies in the world today.

We have jointly examined various aspects of finding relationships in
quarterly UK Divisia data for several years (see [4] for the most
recent UK Divisia report), and recently applied the same models (using
an identical construction approach) successfully to the US's MSI data
[5]. Our work together began in 2002 with the use of a specialized
feedforward neural model tightly coupled with a custom decompositional
rule extraction algorithm. These initial efforts yielded exciting
results as a proof of concept, but the rules were both numerous and
complex. As our research continued, we were able to demonstrate the
discovery of interesting relationships using simpler and more standard
feedforward connectionist models, and a newly decoupled and revised
rule extraction algorithm further simplified the rules and reduced
their number. (These rules are still automatically produced as a
collection of MATLAB-based human-readable and machine-executable
if-then rules, expressing the discovered relationships in terms of the
original data.)

This year we are able to use monthly (vs. quarterly) Divisia data due
to its availability, and the complexity and quantity of the generated
rules are reduced even further. This paper describes our
experimentation with the latest set of monthly UK Divisia data, and
compares these models and results briefly with those of the quarterly
Divisia work and our recent departure into the US MSI.

2  Dataset  Preparation 

Historical UK Divisia M4 and corresponding inflation data was
obtained¹ in order to investigate the relationship between money
supply and inflation. The training data used for connectionist model
selection included monthly seasonally adjusted values from January
1988 through September 2007, a total of 117 exemplars. Inflation was
constructed for each month as year-on-year growth rates of prices. Our
preferred price series, the Consumer Price Index (CPI), was obtained
from DataStream. The CPI data originated from the Office for National
Statistics (ONS) and all data was seasonally adjusted.

The data was prepared by calculating the percentage of increase in
value for corresponding months in consecutive years. This reduced the
dataset to 105 exemplars. The automated clustering algorithm we've
used in previous studies (to examine quarterly Divisia data) was used
again this time to discretize the monthly component data. Components
were represented using thermometer encoding. After inspection,
inflation was manually discretized into 3 distinct ranges:

•  inflation  %  changed  <  0.010 

•  inflation  %  changed  0.010  —  0.020 

•  inflation  %  changed  >  0.020 
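The year-on-year transformation and the three inflation ranges listed above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes that changes are stored as fractions (so 0.020 means 2%) and that the two boundary values fall in the middle range; neither detail is stated in the text.

```python
def year_on_year_growth(series, lag=12):
    """Fractional change between each month and the same month one year earlier.

    A monthly series of length n yields n - lag values, which is why
    117 monthly observations reduce to 105 exemplars.
    """
    if len(series) <= lag:
        raise ValueError("series must be longer than the lag")
    return [(series[i] - series[i - lag]) / series[i - lag]
            for i in range(lag, len(series))]


def inflation_bin(change):
    """Map a year-on-year change to range 1, 2 or 3 (boundary handling assumed)."""
    if change < 0.010:
        return 1
    if change <= 0.020:
        return 2
    return 3
```

Applied to 117 monthly values, `year_on_year_growth` returns 105 values, matching the reduction in dataset size described above.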

Inflation was encoded using mutex (1-of-N) encoding. These two
encoding schemes were selected based on the successful training of
quarterly Divisia data in our past work. (Mutex and thermometer
encoding schemes are commonly used to prepare discretized data for
neural network consumption.)
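For readers unfamiliar with the two schemes, the following Python sketch shows one common form of each. It assumes 1-based bin indices and a thermometer code that uses one bit per level; other conventions exist, and the text does not say which variant was used.

```python
def thermometer_encode(level, n_levels):
    """Thermometer code: the first `level` bits are 1, the remaining bits are 0."""
    if not 1 <= level <= n_levels:
        raise ValueError("level out of range")
    return [1 if i < level else 0 for i in range(n_levels)]


def mutex_encode(level, n_levels):
    """Mutex (1-of-N) code: exactly one bit, at position `level`, is 1."""
    if not 1 <= level <= n_levels:
        raise ValueError("level out of range")
    return [1 if i == level - 1 else 0 for i in range(n_levels)]
```

Under these assumptions, level 3 of a 5-level component becomes `[1, 1, 1, 0, 0]`, while the middle inflation range becomes `[0, 1, 0]`. The thermometer form preserves the ordering of the bins, which the mutex form does not.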

Table 1 summarizes the components and encoding levels generated for
each component, as well as for inflation. The table identifies the
type of asset (component name), the component ID # and symbol used to
represent the component for this study, the type of encoding used, and
the number of levels of the discretized component.

¹Component data is available on the Internet at
http://www.bankofengland.co.uk/statistics/index.htm
(Bank of England Statistical Interactive Database).
Table 1: Divisia M4 Encodings

Component (Attribute)                 ID #  Symbol  Encoding     Levels
Notes and Coins                       1     NC      Thermometer  2
Non-Interest Bearing Bank Deposits    2     NIBD    Thermometer  5
Interest Bearing Bank Sight Deposits  3     IBSD    Thermometer  14
Interest Bearing Bank Time Deposits   4     IBTD    Thermometer  9
Building Society Deposits             5     BSD     Thermometer  2
ISA and TESSA (tax-free savings)      6     ISA     Thermometer  6
Inflation                             N/A   INFL    Mutex        3

When all components are used as inputs to the neural network, there
are 38 binary-valued inputs to the network, and 3 binary-valued
outputs (representing inflation).
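The count of 38 inputs follows directly from the levels in Table 1 (2 + 5 + 14 + 9 + 2 + 6). A small Python sketch of how one data case might be assembled into a single input vector is given below; the helper names, the component ordering, and the one-bit-per-level thermometer convention are assumptions made for illustration.

```python
# Levels per component, taken from Table 1.
LEVELS = {"NC": 2, "NIBD": 5, "IBSD": 14, "IBTD": 9, "BSD": 2, "ISA": 6}


def thermometer(level, n_levels):
    """One bit per level; the first `level` bits are set (assumed convention)."""
    return [1 if i < level else 0 for i in range(n_levels)]


def encode_case(bins):
    """Concatenate the thermometer codes of all six components.

    `bins` maps each component symbol to its 1-based bin index.
    """
    vector = []
    for name, n_levels in LEVELS.items():
        vector.extend(thermometer(bins[name], n_levels))
    return vector
```

Whatever bins a case falls into, the resulting vector always has `sum(LEVELS.values())` = 38 entries.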

3  Neural  Network  Selection 

A series of carefully controlled tests was performed to determine the
best type of simple feedforward connectionist model to use for rule
generation. The 105 data cases were divided at random into a training
set (80%, 84 cases) and a validation set (20%, 21 cases). The breakout
was examined to ensure that the components and outputs were reasonably
represented in both the training and validation data. This same
randomly generated selection was used for each test.
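Reusing one random split across every test is what makes the architectures comparable. A minimal Python sketch of such a reproducible 80/20 split follows; the fixed seed and the function name are illustrative assumptions, not details taken from the study.

```python
import random


def split_cases(n_cases, train_fraction=0.8, seed=0):
    """Return (train_indices, validation_indices) for a fixed random split.

    Using a private, seeded generator means every call with the same
    arguments returns the same partition.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_train = round(n_cases * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])
```

With `n_cases=105` this yields 84 training and 21 validation indices. A check that each output range and component level appears in both subsets would still have to be done separately, as the text describes.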

The results of these tests are summarized in the following tables. The
tables include columns for training and validation "success" expressed
as a number and percentage of correct outputs. All network
architectures were trained to find the INFL target, a set of 3
mutex-encoded binary values corresponding to each data case. Training
(and validation) success was measured by determining the number of
total binary matches for all training (and validation) cases:

•  Training:  (84  cases)x(3  outputs)  =  252 

•  Validation:  (21  cases)x(3  outputs)  =  63 
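The success measure above counts individual output bits rather than whole cases. The Python sketch below shows one way to compute it; because the output layer is linear, the sketch assumes outputs are binarized at a threshold of 0.5, which is not specified in the text.

```python
def binary_matches(outputs, targets, threshold=0.5):
    """Count output bits that agree with the mutex-encoded targets.

    `outputs` holds raw network outputs (one row of 3 values per case);
    `targets` holds the corresponding 0/1 target rows.
    Returns (matches, total_bits).
    """
    matches = 0
    total = 0
    for out_row, target_row in zip(outputs, targets):
        for out, target in zip(out_row, target_row):
            matches += int((out >= threshold) == bool(target))
            total += 1
    return matches, total
```

For 84 training cases the total is 84 x 3 = 252 bits, so a single misclassified case can cost up to two or three matches depending on which bits differ.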

For each candidate model architecture (table row), 500 models with
randomly generated initial conditions were trained for 2500 epochs
each, with the "best" model instance selected to represent the
specified model architecture. It is important to note that "best" is a
somewhat arbitrary term due to the way the "best" network is selected
in our study. When numerous networks with discrete outputs are
trained, they tend to fall into classes, where multiple networks
yielding identical results all belong to the same class. We simply
choose a network from the class of all networks yielding the most
accurate training and validation results. For comparison, the tables
below indicate how many clusters the 500 trained networks fall into
for each architecture ("Net Clusters"), and how many of these 500 are
members of the "class of best networks" from which our selection is
made ("Qty"). This data is intended to ease the minds of those
concerned with our selection of the "best network" for each
architecture as we proceed with our analysis.
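The grouping of trained networks into classes can be sketched in Python as follows. The record layout (`signature`, `train`, `valid`) and the ranking rule (sum of training and validation matches) are assumptions chosen for illustration; the text only states that networks with identical results share a class and that the most accurate class is used.

```python
from collections import defaultdict


def cluster_networks(runs):
    """Group trained networks whose binarized outputs are identical.

    Each run is a dict with a hashable "signature" (the tuple of all
    output bits over all cases) plus "train" and "valid" match counts.
    """
    classes = defaultdict(list)
    for run in runs:
        classes[run["signature"]].append(run)
    return list(classes.values())


def best_class(classes):
    """Pick the class with the most correct bits overall (assumed ranking)."""
    return max(classes, key=lambda members: members[0]["train"] + members[0]["valid"])
```

In this sketch `len(cluster_networks(runs))` plays the role of the "Net Clusters" column and `len(best_class(...))` the role of "Qty in Best Cluster".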

No instance of any network trained for longer than 6 seconds on the
experimental machine, a Slackware 10.1 Linux-based (custom SMP 2.6.13
kernel) dual AMD Opteron 244 system with 2 GB RAM running Matlab 5.3
(R11). The neural models executed in this study contained (except
where indicated otherwise) a single hidden layer using Matlab's logsig
function, a traditional sigmoid activation function. The nodes in the
output layer use the unconstrained linear function (Matlab's purelin).
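The forward pass of such a network is short enough to write out. The following Python sketch mirrors the described architecture (logistic sigmoid hidden layer, linear output layer); it is not the Matlab code used in the study, and the argument names and weight layout are illustrative assumptions.

```python
import math


def logsig(x):
    """Logistic sigmoid, the function Matlab calls logsig."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: sigmoid hidden layer followed by a linear output layer.

    w_hidden[j] holds the input weights of hidden node j;
    w_out[k] holds the hidden-to-output weights of output node k.
    """
    hidden = [logsig(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]
```

For a 38-2-3 network, `w_hidden` would have 2 rows of 38 weights and `w_out` 3 rows of 2 weights.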

Table 2 is for models where only a single component is used as an
input to a network model with 5 nodes in the hidden layer. Components
4 (IBTD) and 6 (ISA) seem to be the most reliable individual
indicators of inflation, based on their training accuracy of over 82%,
but neither component has an overwhelmingly good validation accuracy
(80.95% and 77.78%, respectively).

We also trained a series of models where the inputs lead directly to
the outputs (no hidden layer). Surprisingly, these training and
validation results (not shown here) were almost identical to the
models trained with a single hidden layer of 5 nodes as shown in
Table 2.

Table 2: Single Components as Inputs to n-5-3 FF Networks

ID  Inputs  Train (of 252)  Valid. (of 63)  Net Clusters  Qty in Best Cluster (of 500)
1   2       198 / 78.57 %   45 / 71.43 %    1             500
2   5       196 / 77.78 %   45 / 71.43 %    4             317 (148, 34)*
3   14      203 / 80.56 %   49 / 77.78 %    5             1 (276, 183)*
4   9       216 / 85.71 %   51 / 80.95 %    2             277 (223)*
5   2       196 / 77.78 %   45 / 71.43 %    1             500
6   6       208 / 82.54 %   49 / 77.78 %    1             500

* Quantity in next best cluster(s) shown in parentheses for reference

As a quick test of sensitivity analysis, we also trained a series of
models where all components except for one specific component were
included as inputs. The rows of Table 3 identify the component not
included as inputs to the network. All network models have 5 nodes in
their hidden layer for this series of tests.

A  more  traditional  approach  was  also  taken; 
a  collection  of  models  was  trained  with  various 
numbers  of  nodes  in  the  hidden  layer.  Table  4 
reflects  the  results  of  these  tests,  all  of  which 
use  all  6  components  as  inputs  to  the  network 
(38  encoded  values  in  each  input  vector). 

The goal of training various architectures is to find an appropriate
model from which a collection of human-readable rules can be generated
to accurately describe the dataset. Experimental models using only a
single component as input, either with or without a hidden layer, were
not accurate enough to justify continuing with these simpler models.
Using all but one component showed promise, but did not suggest that
any specific component could be easily eliminated.

The use of all components as inputs in the model consistently yielded
the best results. Note, however, the excellent results of the neural
model containing no nodes in the hidden layer (the row with "0" in the
H column of Table 4), and hence no hidden layer. These results are
nearly as good as those of models containing 10 nodes in the hidden
layer!

4  Rule  Generation 

Since  most  of  the  models  in  Table  4  had  a  high 
degree  of  accuracy  in  training  and  validation, 
we  chose  to  do  rule  extraction  on  the  simplest 
model,  the  network  containing  a  hidden  layer 
with  the  fewest  (non-zero)  number  of  nodes:  the 
38-2-3  model  (2  nodes  in  the  hidden  layer).  This 
network  is  depicted  in  Figure  1. 

Figure 1: Architecture Selection: 38-2-3

Metrics were collected during rule extraction in order to verify the
rules would be a faithful reproduction of the relationships learned by
the network. One intermediate test ran an exhaustive combination of
all possible discrete inputs through the neural network, examining the
differences in output produced by the network and the rule generation
process. Table 1 indicates the number of discrete bins for each input.
The total number of all combinations of possible inputs is a simple
product of these values: 2 * 5 * 14 * 9 * 2 * 6 = 15120. Looking at
each of the three output nodes individually, the accuracy (compared to
providing the same inputs to the neural network) of the intermediate
rule extraction is (in number of mismatches per 15120):

•  Node  1:  6  mismatches,  99.96%  match 

•  Node  2:  21  mismatches,  99.86%  match

•  Node  3:  5  mismatches,  99.97%  match 
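The exhaustive comparison described above is small enough to enumerate directly. The Python sketch below shows the idea for one output node; `network_fn` and `rules_fn` are hypothetical stand-ins for the trained network and the intermediate rules, each mapping a tuple of six bin indices to a binary decision.

```python
from itertools import product

# Number of discrete bins per component, in Table 1 order.
LEVELS = [2, 5, 14, 9, 2, 6]


def count_mismatches(network_fn, rules_fn, levels=LEVELS):
    """Compare two classifiers over every combination of discrete inputs.

    Returns (mismatches, total, percent_match).
    """
    total = 0
    mismatches = 0
    for combo in product(*(range(1, n + 1) for n in levels)):
        total += 1
        if network_fn(combo) != rules_fn(combo):
            mismatches += 1
    return mismatches, total, 100.0 * (total - mismatches) / total
```

The loop visits exactly 15120 combinations. As a check on the figures quoted above, 21 mismatches out of 15120 corresponds to a 99.86% match.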

Although this check is merely a quick test taken during the extraction
exercise, it is encouraging to see the high correlation between the
trained network and the "intermediate" extracted rules.

Table 3: Single Components Excluded as Inputs to n-5-3 FF Networks

ID  Inputs  Train (of 252)  Valid. (of 63)  Net Clusters  Qty in Best Cluster (of 500)
1   36      244 / 96.83 %   61 / 96.83 %    16            1 (4, 15)*
2   33      244 / 96.83 %   62 / 98.41 %    16            2 (7, 12)*
3   24      234 / 92.86 %   61 / 96.83 %    8             8 (110, 117)*
4   29      228 / 90.48 %   63 / 100.0 %    15            1 (2, 6)*
5   36      244 / 96.83 %   61 / 96.83 %    16            2 (5, 25)*
6   32      236 / 93.65 %   58 / 92.06 %    17            2 (3, 6)*

* Quantity in next best clusters shown in parentheses for reference

Table 4: All Components, Variable Hidden Layer Nodes in 38-n-3 FF Networks

H   Train (of 252)  Valid. (of 63)  Net Clusters  Qty in Best Cluster (of 500)
0   244 / 96.83 %   60 / 95.24 %    6             16 (88, 203)*
2   242 / 96.03 %   61 / 96.83 %    15            6 (4, 43)*
3   242 / 96.03 %   62 / 98.41 %    17            2 (6, 3)*
4   244 / 96.83 %   62 / 98.41 %    16            2 (4, 7)*
5   244 / 96.83 %   62 / 98.41 %    17            1 (3, 9)*
6   244 / 96.83 %   63 / 100.0 %    18            1 (2, 6)*
7   244 / 96.83 %   62 / 98.41 %    16            1 (5, 10)*
8   244 / 96.83 %   63 / 100.0 %    16            1 (1, 8)*
9   244 / 96.83 %   62 / 98.41 %    17            1 (4, 10)*
10  244 / 96.83 %   61 / 96.83 %    14            3 (11, 29)*

* Quantity in next best clusters shown in parentheses for reference

The rule extraction technique applied for this research is a
traditional decompositional approach, peering back through the trained
network with an emphasis on the values dynamically generated by the
hidden nodes. For each hidden node, all values are automatically
clustered, and a representative (mean) value is assigned to each
cluster. All combinations of these mean values are evaluated against
the output node weights to determine combinations ("candidate
expressions") producing the desired outputs. These candidate
expressions are simplified and re-expressed as simple rules in terms
of the original network inputs. This is the same rule extraction
algorithm we devised for our previous Divisia and US MSI research
efforts, based on the algorithm originally described in Schmidt and
Chen [6].
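The first stages of this decompositional procedure (cluster the hidden activations, take cluster means, test combinations against one output node) can be sketched in Python as below. This is only an illustration of the general idea, not the authors' algorithm: the gap-based 1-D clustering, the 0.5 output threshold, and all names are assumptions, and the final step of rewriting candidates in terms of the original inputs is omitted.

```python
from itertools import product


def cluster_means(values, gap=0.1):
    """Simple 1-D clustering: start a new cluster whenever consecutive
    sorted values are more than `gap` apart, then return cluster means."""
    clusters = []
    for value in sorted(values):
        if clusters and value - clusters[-1][-1] <= gap:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return [sum(c) / len(c) for c in clusters]


def candidate_expressions(hidden_activations, out_weights, out_bias, threshold=0.5):
    """Return combinations of hidden-node cluster means that switch on
    a linear output node with the given weights and bias.

    `hidden_activations[j]` lists the activations observed at hidden node j.
    """
    means = [cluster_means(values) for values in hidden_activations]
    candidates = []
    for combo in product(*means):
        output = sum(w * h for w, h in zip(out_weights, combo)) + out_bias
        if output >= threshold:
            candidates.append(combo)
    return candidates
```

With only 2 hidden nodes and a handful of clusters per node, the number of combinations to test stays very small, which is one reason a 38-2-3 network is convenient for extraction.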

The automated binning algorithm originally separated INFL outputs into
fifteen bins, but we artificially re-binned the outputs into three
groups. This is more consistent with the automated results from our
previous research, and still yields an excellent mix of potential
outputs among the bins. The bins for these outputs are:


•  Node  1:  (-∞  ...  0.01)

•  Node  2:  (0.01  ...  0.02)

•  Node  3:  (0.02  ...  ∞)

The rule generator produces rules describing each range separately, so
each rule file corresponds to a specific output node, representing a
specific range of output values. (The current generation algorithm
generously allows boundary conditions between two nodes to be
represented in both rulesets.) Note that these rules are expressed in
terms of the original input values for readability, rather than the
encoded forms.

Each file contains a list of rules, numbered for the convenience of
human readers. If some combination of attribute values (nc, nibd,
etc.) is described by any rule in a specific file, those values would
be expected to result in the inflation increase represented by that
node. I.e., all rules in the "node 2" output file describe conditions
producing inflation increases in the range (0.01 ... 0.02) %.

Each line in a rule is formatted: (low_value <= attr & attr <=
high_value), the mathematical equivalent to: low_value <= attr <=
high_value. The symbols "&" and "|" are logical "AND" and "OR"
operations, respectively, and Inf represents infinity. The logic of
the rule must evaluate to "TRUE" for the rule to be true. If a rule
does not include an attribute, the attribute is not required for the
given rule.

    if (
         (-Inf <= nc & nc <= 0.091411) ...
       & (-Inf <= nibd & nibd <= 0.447313) ...
       & (  (0.076963 <= ibsd & ibsd <= 0.106028) ...
          | (0.116952 <= ibsd & ibsd <= 0.122060)) ...
       & (0.122171 <= ibtd & ibtd <= 0.190660) ...
       & (-Inf <= bsd & bsd <= 6.846814) ...
       & (-Inf <= isa & isa <= 0.186993) ...
    ) return true;

Figure 2: Sample Generated Rule

Table 5: Generated Rule Accuracy

Output Node  Rule Qty  Training set (correct, of 84)  Validation set (correct, of 21)
1            57        82 (97.62%)                    20 (95.24%)
2            64        79 (94.05%)                    19 (90.48%)
3            10        81 (96.43%)                    21 (100%)

Figure  2  shows  an  example  of  a  rule  extracted 
from  our  trained  network.  The  example  clearly 
demonstrates  the  human-readable  format  and 
nature  of  extracted  rules.  This  makes  them  ideal 
for  validation  by  subject-matter  experts.  These 
rules  can  also  be  executed  as  code  and  applied 
to  new  data. 
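To illustrate how a rule of this form can be applied to new data, the sketch below re-expresses the rule format in Python rather than Matlab. The data layout (a dict of attribute names to lists of closed intervals) is an assumption made for this example. The thresholds are copied from Figure 2 for the attributes shown; the bsd term is left out of the sketch, which by the convention stated earlier means bsd places no constraint on the rule.

```python
import math

INF = math.inf

# One rule: AND across attributes, OR across the intervals of one attribute.
SAMPLE_RULE = {
    "nc": [(-INF, 0.091411)],
    "nibd": [(-INF, 0.447313)],
    "ibsd": [(0.076963, 0.106028), (0.116952, 0.122060)],
    "ibtd": [(0.122171, 0.190660)],
    "isa": [(-INF, 0.186993)],
}


def rule_matches(rule, case):
    """True when every attribute named in the rule lies in one of its intervals."""
    return all(
        any(low <= case[attr] <= high for low, high in intervals)
        for attr, intervals in rule.items()
    )


def ruleset_matches(rules, case):
    """A rule file fires for a case if any one of its rules matches."""
    return any(rule_matches(rule, case) for rule in rules)
```

A case is then a dict of unencoded growth values, for example `{"nc": 0.05, "nibd": 0.2, "ibsd": 0.09, "ibtd": 0.15, "bsd": 0.3, "isa": 0.1}`, for which `rule_matches(SAMPLE_RULE, case)` is true. Changing `ibsd` to 0.11 places it in the gap between the two intervals, and the rule no longer matches.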

Table 5 shows the number of rules generated for each output node. The
original unencoded training and validation data were processed by the
rule files, with the outputs tested against the known targets for each
dataset. The table clearly indicates a good match between the learned
relationships (rules) and the actual data.

Once again, the chief value provided by the rules is that they are
human-readable and can be vetted by a subject-matter expert
(econometrician, in this case), while also being machine-executable.

5  Interpretation 

The generated rules in all three output files were examined by one of
the authors, a subject-matter expert in econometrics, for specific
applications in economic theory. Although the rules are expressed as
executable code, they were found to be descriptive and easy to read.

Inspection of the rules indicates exactly the same trend as we saw in
our analysis of US MSI data: higher yielding assets have higher impact
on inflation than the lower yielding assets, which conforms with the
construction of Divisia / MSI aggregates. This adds credence to the
argument that we should construct statistically weighted aggregates as
our money supply measure.

These conclusions continue to be consistent with our own previous
analysis of UK Divisia quarterly data, our recent work with US MSI
data, and contemporary published results from other sources. See
Barnett [7] and Elger and Binner [8] for a more detailed description
of user costs of monetary assets.

There are also some interesting relationship patterns that can be seen
from a simple inspection of the resultant rules. Table 6 shows the
frequency of occurrences of components within the generated rules.
From the table, all of the components are generally important inputs
for describing the learned relationships. IBSD and IBTD are frequently
included multiple times to describe cases when "INFL % increase <
0.02" (the first two columns of the table). (See Figure 2 for an
example where IBSD is referenced twice in the same relationship.) In
addition, the last column of the table shows that BSD and ISA are only
important about half of the time for the relationships describing
"INFL increase > 0.02." The implications of these results will merit
closer evaluation. In most cases the rule complexity is fairly low,
with each component being mentioned only once per relationship.
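Percentages above 100% in Table 6 arise because a component can appear more than once within a single rule. A short Python sketch of this kind of tally is shown below; the rule representation (a list of `(attribute, low, high)` terms per rule) is a hypothetical layout chosen for the example.

```python
from collections import Counter


def component_frequency(rules):
    """Count how often each attribute is mentioned across a set of rules.

    Returns {attribute: (mentions, percent_of_rule_count)}; the percentage
    exceeds 100 when an attribute is mentioned more than once per rule
    on average.
    """
    counts = Counter()
    for rule in rules:
        for attr, _low, _high in rule:
            counts[attr] += 1
    n_rules = len(rules)
    return {attr: (count, round(100 * count / n_rules))
            for attr, count in counts.items()}
```

For instance, 87 mentions of IBSD across the 57 rules for output 1 gives 87 / 57, or about 153%.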

It is worth noting that, for the monthly Divisia data, there are 131
total rules across three output nodes (57 + 64 + 10), with a minimum
accuracy of 90%. Our most recent quarterly Divisia experiments yielded
714 rules (96 + 256 + 282 + 80) for four output nodes. The best
results of our US MSI study yielded 116 rules across three outputs,
but accuracy varied between 75% and 92% for each output.

Table 6: Component Frequency of Occurrence

Component   Output 1 (of 57)  Output 2 (of 64)       Output 3 (of 10)
Name        (INFL < 0.01)     (0.01 < INFL < 0.02)   (0.02 < INFL)
NC          57 / 100%         58 / 91%               10 / 100%
NIBD        57 / 100%         64 / 100%              9 / 90%
IBSD        87 / 153%         92 / 144%              10 / 100%
IBTD        64 / 112%         79 / 123%              10 / 100%
BSD         57 / 100%         61 / 95%               4 / 40%
ISA/TESSA   62 / 109%         67 / 105%              6 / 60%

6  Conclusion

The goal of this research effort was to generate rules describing the
relationship of monthly Divisia component data as it applies to
prediction of inflation. A collection of connectionist models was
trained to learn these relationships, then a representative model was
chosen for rule extraction. The successfully generated rules were
shown to be reasonable in number, accurate with respect to both
training and validation data, a faithful representation of the trained
neural network, and (most importantly) easy for econometric experts to
visually examine. These rules, expressed in terms of the original
unencoded data, are also machine-executable Matlab code, and can be
used independently of the original neural network.

The data used in this series of experiments included the ISA/TESSA
term. The inclusion of this term proved to be a valuable addition to
the five terms already used in previous models. The seasonally
adjusted monthly data also yielded superior modeling results and rule
quality when compared to the seasonally adjusted quarterly Divisia
data we've used in the past. The results we report in this series of
experiments are also consistent with other contemporary published
model results.

It is our hope that techniques such as the one represented here can be
commonly employed to provide useful inputs for prediction and control
of inflation. Calibration of these results in a large scale macro
model would still be an interesting route to pursue to determine the
full extent of the impact and implications of these rules for the U.K.
economy.